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Abstract 

Background: The dynamics of biochemical reaction systems are constrained by the fundamental laws of 
thermodynamics, which impose well-defined relationships among the reaction rate constants characterizing these 
systems. Constructing biochemical reaction systems from experimental observations often leads to parameter 
values that do not satisfy the necessary thermodynamic constraints. This can result in models that are not 
physically realizable and may lead to inaccurate, or even erroneous, descriptions of cellular function. 

Results: We introduce a thermodynamically consistent model calibration (JCMC) method that can be effectively 
used to provide thermodynamically feasible values for the parameters of an open biochemical reaction system. The 
proposed method formulates the model calibration problem as a constrained optimization problem that takes 
thermodynamic constraints (and, if desired, additional non-thermodynamic constraints) into account. By calculating 
thermodynamically feasible values for the kinetic parameters of a well-known model of the EGF/ERK signaling 
cascade, we demonstrate the qualitative and quantitative significance of imposing thermodynamic constraints on 
these parameters and the effectiveness of our method for accomplishing this important task. MATLAB software, 
using the Systems Biology Toolbox 2.1, can be accessed from http://www.cis.jhu.edu/~goutsias/CSS lab/software, 
html. An SBML file containing the thermodynamically feasible EGF/ERK signaling cascade model can be found in 
the BioModels database. 

Conclusions: TCMC is a simple and flexible method for obtaining physically plausible values for the kinetic 
parameters of open biochemical reaction systems. It can be effectively used to recalculate a thermodynamically 
consistent set of parameter values for existing thermodynamically infeasible biochemical reaction models of cellular 
function as well as to estimate thermodynamically feasible values for the parameters of new models. Furthermore, 
TCMC can provide dimensionality reduction, better estimation performance, and lower computational complexity, 
and can help to alleviate the problem of data overfitting. 



Background 

Physical systems are constrained to operate according to 
the fundamental laws of thermodynamics. The conserva- 
tion of mass and energy and the production of entropy 
(or heat dissipation) dictate that certain events are phy- 
sically impossible. A broken glass, for example, will not 
spontaneously reassemble, and a bar of gold will not for- 
tuitously appear from thin air. Not all physical con- 
straints imposed by thermodynamics are intuitively 
obvious. As a matter of fact, thermodynamic constraints 
imposed on biochemical reaction systems are routinely 
overlooked in the literature, either due to ignorance of 
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their existence or difficulties in understanding the impli- 
cations of modern non-equilibrium thermodynamics. 
There is an increasing consensus, however, that care 
must be taken to ensure that the kinetic parameters of a 
biochemical reaction system meet these thermodynamic 
constraints [1-6]. 

There are many publications discussing the problem 
of estimating the kinetic parameters of a biochemical 
reaction system from experimental data of molecular 
concentrations, when the underlying stoichiometry is 
known [7-9]. Essentially, all approaches to this problem, 
which is often referred to as model calibration, are 
based on deriving a cost function and choosing an opti- 
mization algorithm to minimize that function [10]. The 
cost function provides a measure of 'goodness of fit' of 
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the estimated biochemical reaction system dynamics to 
available observations, and may be designed by a variety 
of statistical inference techniques, such as maximum like- 
lihood, Bayesian inference, etc., or by simply employing 
an appropriate distance metric, such as least-squares. The 
optimization procedure used must be characterized by 
superior performance for finding global minima, due to 
the highly non-convex and multi-modal nature of the 
cost function [10]. The vast majority of published model 
calibration methods, however, produce biochemical reac- 
tion systems that may not be physically realizable, since 
they do not take into account the fact that the underlying 
kinetic parameters may be constrained by the fundamen- 
tal laws of thermodynamics. 

Recently, there have been several attempts to address 
the issue of thermodynamic constraints in chemical 
kinetics [2-5]. Among proposed methods, the thermody- 
namic-kinetic modeling (TKM) approach [5] enjoys 
some benefits over other techniques. However, we have 
previously noted in [11] that this approach is unnecessa- 
rily complicated and can be cumbersome, especially 
when dealing with molecular perturbations (commonly 
used in systems biology) or when merging estimated 
TKM models [5]. 

To address these problems, we have recently proposed 
two techniques for estimating the kinetic parameters of 
closed biochemical reaction systems from available 
observations of molecular concentrations in a thermody- 
namically consistent fashion [11,12]. In [12], we model 
biochemical reaction systems by mass-action kinetics, 
use maximum-likelihood estimation, and employ a pro- 
jection step that allows us to appropriately choose 
kinetic parameter values so that the final system is ther- 
modynamically feasible. In [11], we employ a Bayesian 
inference approach, eliminate the projection step, 
and derive a biophysically based cost function over 
parameters that can be chosen independently without 
violating the underlying thermodynamic constraints. 
Unfortunately, both methods are limited, being applied 
to closed biochemical reaction systems using standard 
mass action kinetics. 

In this paper, we propose a method for calibrating the 
kinetic parameters of biochemical reaction models of 
cellular function so that the resulting systems are ther- 
modynamically feasible. The method, which we refer to 
as Thermo dynamically Consistent Model Calibration 
(TCMC), works with any iterative parameter estimation 
algorithm of choice and can be applied to open bio- 
chemical reaction systems, which, in problems of sys- 
tems biology, are more realistic than the closed systems 
we considered previously. Furthermore, TCMC is cap- 
able of handling any kinetic rate laws in ideal mixtures, 
such as non-mass action rate laws commonly used to 
describe complex enzymatic reaction schemes. 



We exemplify practical aspects of the proposed tech- 
nique by recalculating the kinetic parameters of a well- 
known model of the EGF/ERK signaling cascade [13], 
which is thermodynamically infeasible. This allows us to 
propose a thermodynamically feasible model for this 
important signaling pathway that is physically realizable 
and better matches available densitometric data. Com- 
puter simulations reveal a number of qualitative and 
quantitative differences of possible biological significance 
between the thermodynamically feasible and the ther- 
modynamically infeasible published model, which need 
to be validated experimentally. Moreover, we discuss a 
number of important advantages gained by TCMC over 
estimating the kinetic parameters using a collective fit- 
ting approach that does not consider the underlying 
thermodynamic constraints. Besides producing physi- 
cally realizable and thermodynamically consistent mod- 
els, TCMC may result in dimensionality reduction, 
better estimation performance, and lower computational 
complexity. MATLAB software, using the Systems Biol- 
ogy Toolbox 2.1 http://www.sbtoolbox2.org, can be 
accessed from http://www.cis.jhu.edu/~goutsias/CSSlab/ 
software.html. An SBML file containing the thermody- 
namically feasible EGF/ERK signaling cascade model can 
be found in the BioModels database http://www.ebi.ac. 
uk/biomodels-main. 

We believe that TCMC can be effectively used to 
recalculate the parameter values of any existing thermo- 
dynamically infeasible biochemical reaction model of 
cellular function, as well as to estimate the parameters 
of new biochemical reaction models from available 
experimental data, thus producing physically plausible 
versions of these models compatible with the fundamen- 
tal laws of thermodynamics. Finally, as more chemical 
species or reactions are discovered, TCMC can be used 
to easily extend existing models of cellular activity in a 
thermodynamically consistent and computationally effi- 
cient fashion. 

Results 

Biochemical Reaction Systems 

Most biological processes of interest to systems biology 
are modeled by means of open biochemical reaction sys- 
tems; i.e., systems that exchange mass with their sur- 
roundings [14]. Living cells, for example, are open 
systems maintaining themselves by exchanging materials 
with their environment. Mass exchange is modeled 
either explicitly or implicitly. Examples of explicit mod- 
eling include: clamped species, reactions with null spe- 
cies as reactants or products, and irreversible reactions 
[15]. Clamped species are chemicals whose concentra- 
tions are held fixed. They are often used to model mole- 
cular species whose concentrations are affected by 
unknown reactions. It is apparent that these chemicals 
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must be supplied or removed from the system at appro- 
priate rates to ensure that their concentrations do not 
deviate from their fixed values. On the other hand, reac- 
tions with null reactants or products model mass trans- 
fer in and out of the system, respectively. Finally, the 
use of an irreversible reaction is based on the assump- 
tion that the concentration of at least one of its pro- 
ducts is clamped to zero (otherwise, the reaction can be 
reversed at sufficiently high product concentrations), 
which implies mass transfer as well. An example of 
implicitly modeling mass exchange is a reaction with 
reactants or products not modeled by the system. A 
common case would be the phosphorylation of a protein 
without explicitly modeling conversion of ATP to ADP 
[5]. 

An open biochemical reaction system is comprised of 
N molecular species X lf X 2 ,..., X N that interact through 
M coupled reactions, given by 



neAf 



\X n 



neAf 



X n/ me M, 



where AT := {1, 2, . . . , N}, *M\ = {1, 2,..., M}, and v nm) 
v' nm > 0 are the stoichiometrics of the reactants and pro- 
ducts. We can characterize an open biochemical reac- 
tion system at time t > 0 by the concentrations 
{x n (t), n g A/"} of all molecular species at t. Clamped 
molecular species J\f c c M have concentrations that do 
not vary with time, whereas, the concentrations of the 
remaining 'dynamic' species Md = J\f\J\f c evolve as a 
function of time. We will assume that the system char- 
acterizes reactions in an ideal and well-stirred (homoge- 
neous) mixture at constant temperature and volume and 
that the concentrations {x n {t), n e Md} vary continu- 
ously in time. In this case, we can describe the dynamic 
evolution of the molecular concentrations in the system 
by the following chemical kinetic equations: 



dx n (t) 
dt 



meM 

o, 



(t, k), t eT,n e Md 
t eT,n eM c , 



(1) 



initialized by setting x n (0) = q n , n e J\T> for some initial 
concentrations q n , n e J\f> where (p m (t, k) is the net flux 
of the m th reaction, s nm := v' nm — v nm is the net stoichio- 
metry coefficient of the n th molecular species associated 
with the m th reaction, T := [0, f max ] is an observation 
time window of interest, and k is a vector of kinetic 
parameters {kj, j e where J = {1,2, ...,/}. The 
parameters k characterize the biochemical reaction sys- 
tem at hand and are independent of the molecular con- 
centrations {x n {t), n g A/"}. Moreover, we assume that 
these parameters do not vary with time. 

By appropriately pruning and modifying an open bio- 
chemical reaction system, we can derive a closed 



subsystem (i.e., a system that does not exchange mass 
with its surroundings) that lends itself to thermody- 
namic analysis since external or unknown thermody- 
namic forces no longer exist. To do so, we first remove 
all null and irreversible reactions, as well as partially 
modeled reactions (i.e., reactions with incomplete stoi- 
chiometrics). Next, we remove clamping of molecular 
species involved in reversible reactions. Subsequently, 
we keep only reactions that are thermodynamically inde- 
pendent. Thermodynamic independence among reac- 
tions means that a reaction is only driven by its own 
thermodynamic force, which implies that the affinity of 
the reaction will be zero if and only if there is no 
change in its degree of advancement. This condition is 
usually fulfilled if we keep only elementary reactions (i. 
e., reactions that take place in one single step). As a 
consequence, we obtain a closed reaction set <= <M. 
Finally, we remove all species that are no longer 
involved with any of the reactions in *Jt Q , leaving only 
the molecular species Mo c J\f associated with the 
closed subsystem. 

The main rationale behind the second step is that the 
kinetic parameters k considered in this paper are 
assumed to be independent of the molecular concentra- 
tions. As a consequence, the values of these parameters 
will not change if the concentrations of the clamped 
species are allowed to vary. Therefore, we can construct 
a (possibly artificial) situation in which the concentra- 
tions of the clamped species vary as if they were 
dynamic species. Because our goal in creating the closed 
subsystem is to discover and enforce thermodynamic 
constraints on the kinetic parameters, we must include 
the clamped species in our model. This is contrary to 
simply removing all clamped species and the associated 
reactions, since this approach will not allow us to deter- 
mine thermodynamically feasible values for the kinetic 
parameters of the removed reactions. 

The third step is due to a simplification imposed on 
us by the current state of non-equilibrium thermody- 
namics. Thermodynamically dependent reactions influ- 
ence each other, since the thermodynamic force of one 
reaction may drive the other reaction and vice versa. 
Unfortunately, it is not clear at this point how to deal 
with thermodynamically coupled reactions. Future 
research may be necessary to address this issue. 

The resulting closed biochemical reaction subsystem is 
comprised of N 0 molecular species {X nr n gJV 0 } that 
interact through M 0 coupled reversible reactions. The 
dynamic evolution of the molecular concentrations in 
this system is governed by: 



dx n {t) 
dt 



meA4o 



(t,k), t e T, neA/i, 



(2) 
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initialized by x n (0) = q w for n e A/"o- We will character- 
ize all reactions in ^J{ 0 by the generalized mass-action 
rate law [16]. In this case, the net fluxes are given by 



% {t, k) = f m [x{t), 7t] 
x (r 2m -i EI IxiW^-rim ft l x Mf im 

\ ieJ\f 0 ieAfo 



(3) 



for m e <JC 0i where r 2w _i, r 2m are the (generalized) 
rate constants of the m th forward and reverse reactions, 
respectively. The quantity f m [x(t), n] can be any positive 
and finite function of the concentrations x(t) and may 
depend on a set of kinetic parameters n. For usual mass 
action kinetics, f m [x(t), n] = 1. However, for more com- 
plex schemes, this function usually takes a rational or 
polynomial form. It is known that all reversible reaction 
rate laws in ideal mixtures (including reversible Michae- 
lis-Menten kinetics) can be described by (3) [5]. Note 
that k includes the kinetic parameters n as well as the 
rate constants {r 2m -i> r 2m> m ^o\- It also includes the 
kinetic parameters of all reactions in *y# e <J{ 0 . 

It is a direct consequence of thermodynamic analysis 
that a closed biochemical reaction system will asympto- 
tically reach a unique non-zero state {x n > 0, n e A/"o} 
of chemical equilibrium at which all concentrations 
become stationary (assuming, of course, we have non- 
zero initial conditions), satisfying the following detailed 
balance equations: 

Tlm-l ]"[ Xn™ = r 2m f~[ ^n™ , for all 171 G M 0 . ^ 



As a consequence, 



z m := In 



f2m-l 
r 2m 



s nm lnx n , for all m g Mq. 



neJ\f 0 



(5) 



These constraints must be satisfied by the rate con- 
stants in order for the closed biochemical reaction sys- 
tem to be thermodynamically feasible. 

The constraints implied by (5) correspond to the reac- 
tion cycles' in the system. A reaction cycle is comprised 
of those reactions corresponding to the nonzero elements 
of a vector in the null space null (So) of the stoichiometry 
matrix S 0 = [s nm , n e J\f 0f m e Mo] of the closed sys- 
tem. Clearly, (2) will be at a fixed point whenever the net 
fluxes of the underlying reactions are set equal to the 
corresponding elements of a vector in null (So). If we 
denote by s the No x 1 vector whose n th element is the 
log steady-state concentration In x n and by z the M 0 x 1 
vector whose m th element is the log equilibrium constant 
z m = ln(r 2m _i/r 2m ), then we can write (5) in a matrix- 
vector form as Sjs = z. Now, if b is a vector in the null 



space of S 0 , then S 0 b = 0 and b T z = f? T Sjs = (S 0 b) T s = 0, 
or 



n 

meMo 



Tlm-l 
Tim 



1, for all b g null(So), 



(6) 



which are the well-known Wegscheider conditions [6]. 
These conditions express necessary and sufficient con- 
straints on the reaction rate constants of a closed bio- 
chemical reaction system to be thermodynamically 
feasible. Thus, if we denote the set of all thermodynami- 
cally feasible parameters k by >V, then any k g W satis- 
fies (6) and, likewise, any k that satisfies (6) is a member 
ofW. 

We want to emphasize that, in open biochemical reac- 
tion systems, the rate constants of the reversible reac- 
tions must also be constrained by the Wegscheider 
conditions, even if the system is far from equilibrium. 
To identify these constraints, we need to prune an open 
biochemical reaction system into a closed subsystem, by 
employing the technique discussed previously, and use 
the resulting stoichiometry matrix So to calculate the 
constraints given by (6). 

An equally important observation is that the rate con- 
stants of the reactions pruned from an open system are 
not constrained by the Wegscheider conditions, since 
(4) must only be satisfied by the reactions in the closed 
subsystem. Furthermore, if a reaction m in a closed sys- 
tem is not part of a cycle, then b m = 0, for every b g 
null (So), and its forward and reverse rate constants will 
not be thermodynamically constrained, since these con- 
stants trivially satisfy (6). 

It can be shown (see Additional file 1) that the entropy 
production rate of an open biochemical reaction system 
at chemical equilibrium in which the net fluxes of the 
reactions in *JIq equal to the elements of a vector in the 
null space of the stoichiometry matrix So, is given by 

a{b)=AVk B \n FT ( r -^zl\ f for all b e null(So), (7) 

meM 0 V ^ ' 

where A = 6.02214084(18) x 10 23 mol _1 is the Avoga- 
dro number, V is the system volume, and h B = 
1.3806504 x 10" 23 JK _1 is the Boltzmann constant. 
According to the second law of thermodynamics, the 
entropy production rate must always be greater than or 
equal to zero, with equality if and only if the system is 
at thermodynamic equilibrium. It is therefore clear from 
(7) that the Wegscheider conditions imply that the 
entropy production rate must be zero in this case (i.e., 
the system must be at thermodynamic equilibrium). As 
a consequence, the chemical motive force (which is the 
amount of energy added to the system per unit time 
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due to mass exchange through its boundary) and the heat 
dissipation rate must also be zero. This makes intuitive 
sense, since a reaction cycle leaves all molecular concen- 
trations unchanged and, therefore, there is no change in 
internal energy or mass flow through the system bound- 
ary. Clearly, we can think of the Wegscheider conditions 
as being a direct consequence of the thermodynamic 
requirement that a (b) = 0, for every b g null (So). 

Linear constraints 

Unfortunately, (6) imposes a possibly infinite number 
of non-linear constraints on the rate constants of a 
closed biochemical reaction system. However, it is suf- 
ficient to satisfy (6) for M 2 = M 0 - Mi basis vectors 
{b {i \ i = 1, 2,..., M 2 } of the null space of So where M 0 
is the number of reactions and Mi = rank (So) (see 
Additional file 1). By using this observation and by 
taking logarithms on both sides of (6), we obtain the 
following linear constraints on the log-rate constants 



E 

meM 0 



bm(Q2m 



Qlm-l) = 0, fori = 1,2, 



(8) 



where frW is the m th component of the i th basis vector 
b^ l \ In Additional file 1 we derive an analytical formula 
for the basis vectors of the null space of So [see Equa- 
tion (S.l)]. As a consequence, the possibly infinite non- 
linear Wegscheider conditions given by (6) are equiva- 
lent to much more manageable finite linear constraints 
on the log-values of the parameters of the closed subsys- 
tem, given by 



Wk = 0, 



(9) 



where k := ln(k) and W is an M 2 x / matrix that can 
be easily constructed from knowledge of So using (8). 
Hence, k g W if and only if W ln(k) = 0. 

We should note that there might be additional linear 
constraints that we may wish to impose on the loga- 
rithms of the kinetic parameters of a biochemical reac- 
tion system. Here are some examples: 

♦ By using an appropriate experimental procedure, 
we may be able to accurately measure the equili- 
brium constant R m of the m th reversible reaction. 
For a (generalized) mass-action reaction, we have 
R m := r 2m -i/r 2m and thus we obtain a linear constraint 

Q2m-\ - Qim = 4T eas) on tne rate constants of tne 
m th reaction, where 4™ eas ) is the log value of the 
measured equilibrium constant. 

♦ For a reversible Michaelis-Menten reaction, the 
Haldane relationship implies a linear constraint 
between the logarithms of the kinetic parameters of 



a reversible Michaelis-Menten reaction and 

(measjnyl 

♦ By using experimental techniques, such as plasmon 
resonance or atomic force microscopy, it may be pos- 
sible to obtain a highly accurate measurement fcj meas ) 
of an individual kinetic parameter kj. In this case, we 
must impose the (trivial) constraint ejic = In [fej meas ^ ], 
where e y is the / h column of the / x / identity matrix. 

♦ To reduce the dimensionality of parameter estima- 
tion, we may employ a sensitivity analysis approach, 
such as the one proposed in [18,19], to identify para- 
meters that do not appreciably influence the cost of 
estimation. Determining accurate values for these 
parameters is inconsequential to the behavior of the 
biochemical reaction system at hand. Therefore, we 

can fix these parameters to some nominal values fcj 0 ^ 
throughout model calibration, resulting again in lin- 
ear constraints of the form ejic = In [k^ ]. 

♦ We may want to expand an existing (validated) 
thermodynamically feasible model to include addi- 
tional reactions and molecular species. We can do 
this by fixing the parameters of the existing model 

using linear constraints ejic = In [fej exlst ^], where k^ exlst ^ 
is the / h parameter value of the existing model. Then 
estimation takes place only on the parameters asso- 
ciated with the new reactions. 

For a given biochemical reaction system, we can com- 
bine all possible linear constraints on the logarithms k 
of the kinetic parameters k into a single matrix equation 
of the form: 



Ak = c, 



(10) 



where A is an appropriately constructed L x / matrix 
and c is an L x 1 vector of known values determined by 
the constraints, with L being the number of constraints. 
Note that if there are no constraints other than the 
Wegscheider conditions, we would simply have A = W 
and c = 0. 

Model calibration 

We will now assume that we have obtained noisy mea- 
surements y = {y^ p) (^), n e M s , p e V, q e Q] of the 
concentration dynamics of selected molecular species 
n G Af s c J\f in an open biochemical reaction system of 
known stoichiometry, obtained by a set of distinct 
experiments peV = {l,2, . . . , P} at discrete time 
points{^, q e Q}, Q = {1, 2, . . . , Q}, within the obser- 
vation time window T< The problem of model calibra- 
tion we consider in this paper is to determine 
thermodynamically consistent values for the kinetic 
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parameters {kj, j e J}, so that the concentration 
dynamics x = {x$\t q ), n e M Sl p e V, q e Q} produced 
by the estimated system match y in some well-defined 
sense. Note that, for a given p e V, the dynamics 
{Xn\t) f n e AT] are computed via (1) using initial con- 
ditions Xn\o) = c$ that correspond to the p th experi- 
mental condition. Data y may be obtained by 
appropriately designed in vivo or in vitro experiments, 
or by simulating an established and experimentally 
validated biochemical reaction model whose kinetic 
parameters are thermodynamically infeasible. Instead of 
focusing on the quality of the estimated values of the 
kinetic parameters, it has been argued in [20] that 
matching predicted and experimental observations of 
molecular concentrations is the right thing to do due 
to the 'sloppiness' of biochemical reaction systems (dif- 
ferent combinations of parameter values may essen- 
tially lead to the same concentration dynamics). 

There are many estimation techniques we can use to 
address the previous problem, such as maximum likeli- 
hood or Bayesian inference. The final product of these 
techniques is a cost function C(k \y) used to quantify 
the overall error between the predicted concentration 
measurements x, obtained by simulating the biochemical 
reaction system with kinetic parameter values k = exp 
{k}, and the noisy observations y. In an effort to reduce 
the typically large dynamic range of kinetic parameter 
values, it is customary to estimate the log values k 
instead of k. As a consequence, the problem of interest 
here is to compute an estimate £ of the log kinetic para- 
meters k, so that 

k = argminC(K|y), m) 

keA v 7 

where A is the set of all ks satisfying the linear con- 
straints given by (10); i.e. ,A := {k : A/c = c}. For simpli- 
city, we consider in this paper the least-squares error 
cost criterion, given by 

c(*ly) = E E E b# } W " xn\t q )]. (12) 

ueM $ pEP c\eQ 

This error criterion is a consequence of a maximum 
likelihood approach to parameter estimation under the 
assumption of normally distributed observation errors. 
Note that the cost C depends on the log kinetic para- 
meters k through the molecular concentrations Xn\tA 

In this paper, we refer to the constrained optimization 
problem given by (11) as Thermodynamically Consistent 
Model Calibration (TCMC). The importance of TCMC 
lies on the formulation of the model calibration problem 
as one of constrained optimization via (11), with con- 
straints that ensure at least the thermodynamic feasibility 



of the resulting model. A useful observation is that TCMC 
is agnostic to the choice of the cost function used and the 
algorithm employed for optimization. Moreover, we can 
easily transform the constrained optimization problem 
given by (11) to a standard unconstrained problem. 
Indeed, a well known result of linear algebra implies that 
A := {k : Ak = c} = {k : k = k 0 + Bv, v e ffi d }> where 
d = J — rank (A), k 0 is a / x 1 vector that satisfies 
Ak 0 = c, B is a / x d matrix whose columns form a basis 
for the null space of matrix A, and y\ d is the space of all d 
x 1 real valued vectors. Thus, if we can find a particular 
solution k 0 to the constraints Ak = c, we can reformulate 
the constrained optimization problem given by (11) as the 
following lower dimensional unconstrained problem: 

k = Kq + Bv 

. n , , , (13) 
v = argmmCo [v \y), 

where 

C 0 {v\y):=C{k 0+ Bv\y). (14) 

Note that we assume here that there is more than one 
solution to Ak = c- If only one solution exists, optimiza- 
tion is not necessary, since this solution will be our 
parameter estimate. On the other hand, if Ak = c has no 
solution, then we cannot find a k that will simulta- 
neously satisfy all necessary constraints, indicating that 
we must reformulate the problem. 

The objective function C 0 is non-convex with possibly 
many local minima. As a consequence, a gradient-based 
optimization algorithm for solving (13) may prematurely 
terminate at a local minimum with much larger cost 
than the globally minimum cost. To ameliorate this pro- 
blem, we have decided in this paper to use a stochastic 
optimization algorithm, namely simulated annealing 
(SA) [21]. Stochastic optimization algorithms can move 
away from premature local minima, thus resulting in 
better solutions to optimization problems than when 
using deterministic techniques. Although many choices 
exist for optimization, such as the simultaneous pertur- 
bation stochastic approximation (SPSA) method 
employed in our previous work [11] and genetic algo- 
rithms, SA is a stochastic search optimization algorithm 
that enjoys several advantages over other algorithms. In 
particular, the most important features of SA are ease of 
implementation and the ability to avoid premature con- 
vergence by jumping away from local minima en route 
to finding a global minimum. In SA, a new value of v is 
proposed nearby the current value. The proposed value 
becomes the new value with a certain probability based 
on cost improvement. If the proposed value is not 
accepted, then the current value is used. The proposed 
value is usually drawn from an appropriately chosen 
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probability distribution around the current value (e.g., a 
Gaussian distribution centered at the current value). See 
Additional file 1 for a detailed description of the SA 
algorithm used. 

A natural question that arises here is whether different 
choices for k 0 and E> affect the final result of optimiza- 
tion. If we had an algorithm that could always find the 
global solution to a non-convex optimization problem, 
then the choice of k 0 and E would have no effect on the 
solution. Since however global minima are difficult to 
find, we expect that different choices for k 0 and B will 
have some impact on the final solution. Note that it 
would be advantageous to choose ft 0 as close as possible 
to the globally optimal solution. We attempt to do so in 
our subsequent example by taking k 0 to be a solution to 
Akq = c that is closest (in the least-squares sense) to 
published values. On the other hand, we expect that the 
choice of E> will have only a minor effect on optimiza- 
tion, since different matrices E> amount to scaling or 
rotating the axes of the parameter space being searched. 
Good optimization algorithms, such as SA, are expected 
to be robust to such alterations. 

Example 

We now demonstrate the proposed TCMC method by 
re-estimating the kinetic parameters of a classical model 
of the EGF/ERK signaling cascade [13]. This model con- 
sists of three compartments (extracellular space, cyto- 
plasm, and endosomal volume), N = 100 biochemical 
species, and M = 125 reactions. Moreover, it is charac- 
terized by 90 different kinetic parameter values, a num- 
ber that is smaller than the total number of individual 
reactions, due to the fact that some reactions share the 
same kinetic parameters, whereas, other reactions are 
not associated with any parameters. 

Although the Schoeberl model has provided valuable 
insights into the biological mechanisms underlying EGF/ 
ERK signaling, the values of the kinetic parameters pub- 
lished in the literature are thermodynamically infeasible. 
As a consequence, the concentration dynamics produced 
by the published model are physically impossible and 
could not occur in nature. By using TCMC to recom- 
pute thermodynamically feasible values for the kinetic 
parameters, we can construct a physically realizable 
model whose dynamics are expected to reflect the true 
behavior of EGF/ERK signaling more accurately than 
the dynamics produced by the published model. 

We use the version of the Schoeberl model published 
in the BioModels database [22]. Moreover, we employ 
the same experimental time series data of ERK-PP activ- 
ity used for creating the original model [13]. To simplify 
implementation of TCMC, we assume here that the 
Schoeberl model is characterized by / = 2M = 250 
kinetic parameters (two parameters per reaction). For an 



irreversible reaction, we constrain the rate constant of 
the reverse reaction to be equal to zero. For those reac- 
tions not associated with any kinetic parameters, we 
assign two artificial parameters (one for the forward and 
one for the reverse reaction) and constrain both their 
values to be zero. 

We implement the following TCMC procedure to re- 
estimate the values of the kinetic parameters in a ther- 
modynamically consistent manner. First, we find the 
closed subsystem of the Schoeberl model (see Additional 
file 1). This subsystem consists of N 0 = 93 molecular 
species and M 0 = 83 reversible elementary (monomole- 
cular and bimolecular) reactions. Then, we determine 
the thermodynamic constraints by using the 93 x 83 
stoichiometry matrix So of the closed subsystem. It 
turns out that the rank of the stoichiometry matrix So is 
65. As a consequence, the closed subsystem contains M 2 
= 83 - 65 = 18 independent reaction cycles, determined 
by the columns of matrix B 0 , given by Equation (S.l) of 
Additional file 1 (see the file for details on the reaction 
cycles). Therefore, the rate constants of the closed sub- 
system must satisfy 18 independent Wegscheider condi- 
tions. It turns out that the published Schoeberl model 
satisfies only 10 of these conditions. As a matter of fact, 
the entropy reaction rates, given by (7), associated with 
5 independent reaction cycles are negative, in direct vio- 
lation of the second law of thermodynamics, whereas, 
the entropy reaction rates of 3 reaction cycles are posi- 
tive, with the remaining being equal to zero (see Table 
S.2 in Additional file 1). 

Next, we construct matrix A and vector c by combin- 
ing the 18 thermodynamic constraints with 167 linear 
equality constraints originally built into the model that 
relate various parameters across reactions (see Addi- 
tional file 1 for details on these constraints), thus produ- 
cing the linear constraints Ak = c. In this case, A is a 185 
x 250 matrix, whereas, c is a 250 x 1 vector with ele- 
ments 0 or -oo (this corresponds to kinetic parameters 
whose values are fixed to zero). It turns out that rank 
(A) = 171, which implies that the dimension of the null 
space of A is d = 79. Subsequently, we find a basis for 
the null space of A and form the 250 x 79 matrix E>. 
Moreover, we find a particular solution k 0 of Ak = c 
that is closest, in the least-squares sense, among all 
other solutions to the published thermodynamically 
infeasible parameter values. We accomplish this by 
using a well-known approach for solving constrained 
least-squares problems [23]. Finally, by using (12) on the 
aforementioned experimental time series data and SA, 
we calculate the 79 x 1 vector y by minimizing the cost 
function C 0 (v |y), given by (14), and set k = ic 0 + Bv. 

We take the thermodynamically feasible log kinetic 
parameter values k 0 to be close to the published kinetic 
parameter values, since the published values already 
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produce a good match between the experimentally avail- 
able and predicted molecular dynamics. In this case, 
TCMC provides a thermodynamically consistent correc- 
tion of k 0 , by means of Ev> that reduces the cost of esti- 
mation, thus further improving the match between the 
experimentally obtained and predicted dynamics. 

In Figure 1, we depict the concentration dynamics of 
ERK-PP obtained by the published model (dotted curves), 
for two different input EGF concentrations, namely 50 
ng/mL and 5 ng/mL, whereas, in Figure SI of Additional 
file 1 we depict the concentration dynamics of ERK-PP 
obtained by the published model for three additional con- 
centrations of input EGF, namely 0.5 ng/mL, 0.125 ng/ 
mL, and 0.0625 ng/mL. Clearly, the resulting dynamics 
match the available densitometric data (indicated by the 
circles) rather poorly. 

We should note here that the ERK-PP dynamics ori- 
ginally published in [13] seem to match the available 
data pretty well - see Figure 2F in [13]. However, the 
model detailed in the original publication cannot be 
used to reproduce the results. On the other hand, the 
model available in the BioModels database does not 
reproduce the results published in the Schoeberl paper 
but produces dynamics that are very similar to the ones 
reported in that paper. 

The solid curves in Figure 1 and Figure SI of Addi- 
tional file 1 depict the ERK-PP concentration dynamics 
estimated by TCMC. TCMC results in a thermodynami- 
cally consistent model of the EGF/ERK signalling cas- 
cade that produces ERK-PP concentration dynamics 
which match the available experimental data noticeably 
better than the dynamics produced by the published 



model. As a matter of fact, model fit between predicted 
and experimentally measured ERK-PP concentration 
dynamics, quantified by the least-squares error given by 
(12), is reduced by 69%. TCMC simultaneously adjusts 
the values of the kinetic parameters in order to mini- 
mize the cost of fitting the ERK-PP response to the 
available densitometric data. This 'collective fitting' 
strategy has been recognized in the literature [20] as 
being more desirable than constructing a biochemical 
reaction system model from individual parameter esti- 
mates in a piecewise fashion, which is the case with the 
published Schoeberl model. 

To separate the effect of collective fitting versus impos- 
ing the underlying thermodynamic constraints on the 
kinetic parameters, we use the same simulated annealing 
algorithm employed by TCMC to estimate the kinetic 
parameters without including the thermodynamic con- 
straints. The resulting (thermodynamically infeasible) esti- 
mated dynamics are depicted by the dashed curves in 
Figure 1 and Figure SI of Additional file 1. As expected, 
these dynamics fit the densitometric data better than the 
dynamics obtained by the published model. As a matter 
of fact, model fit between predicted and experimentally 
measured ERK-PP concentration dynamics is reduced by 
70% in this case, which is slightly better than the one 
obtained by TCMC. It will shortly become clear however 
that the solution obtained without imposing the thermo- 
dynamic constraints leads to unrealistic system behavior. 
In the following, we discuss a number of advantages 
gained by TCMC over estimating the kinetic parameters 
using a collective fitting approach that does not consider 
the underlying thermodynamic constraints. 
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Figure 1 ERK-PP concentration dynamics, measured in mol/m 3 , under two different input EGF concentrations. The Additional file 1 
reports additional dynamics. 
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Discussion 

Qualitative/quantitative value of thermodynamic 
consistency 

Due to lack of thermodynamic consistency in the para- 
meter values of the published Schoeberl model, the mole- 
cular dynamics produced by this model cannot possibly 
occur in nature. Because the values estimated by the pro- 
posed TCMC method satisfy all necessary thermody- 
namic constraints, it is expected that the resulting TCMC 
model will provide a more accurate representation of 
EGF/ERK signaling than the published Schoeberl model 

To provide an example of a potentially important dif- 
ference between the published model and the TCMC 
model calculated in this paper, we consider the inte- 
grated response of ERK-PP activity (i.e., the cumulative 
ERK-PP concentration). It has been argued in the litera- 
ture that the integrated response provides an appropri- 
ate metric for quantifying dependence of DNA synthesis 
on ERK activation in certain cells [24]. As a conse- 
quence, this feature of the ERK-PP concentration 
dynamics can influence a number of diverse biologically 
outcomes, such as cell cycle progression, cell prolifera- 
tion, and cell differentiation. 

In view of the fact that differences in the integrated 
response of ERK-PP activity may cause distinct biologi- 
cal outcomes, it is reasonable to believe that a key 
objective of EGF/ERK signaling is to maintain robust 
integrated response to changes in input EGF concentra- 
tion while producing a quick and sharp 'switch-like' 
transition between states of differing biological out- 
comes. In Figure 2, we provide a log-log plot of the inte- 
grated response of ERK-PP activity predicted by the 
published (dotted curve) and TCMC (solid curve) mod- 
els, for a wide range of input EGF concentrations. 
Although the integrated response predicted by both 
models is indeed robust for input EGF concentrations 
larger than 10" 2 ng/mL, when the EGF concentration 
decreases below 10" 2 ng/mL, the integrated response of 
ERK-PP activity predicted by the TCMC model exhibits 
a sharper transition from large to small values than the 
one predicted by the published model. As a matter of 
fact, the TCMC model predicts seven orders of magni- 
tude decrease in the integrated response, when the 
input EGF concentration decreases from 10" 2 ng/mL to 
10" 3 ng/mL, whereas, the published model predicts only 
four orders of magnitude decrease. By considering the 
discussion in [24], we believe that this behavior by the 
TCMC model may turn out to be a biologically impor- 
tant property of the EGF/ERK signaling pathway that 
cannot be effectively captured by the published model. 

We will now show that the TCMC model may result 
in a biologically plausible prediction of ERK-PP activity 
that can also be different than the one produced by the 
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Figure 2 Integrated response of ERK-PP activity, measured in 
mol • min/m 3 , as a function of input EGF concentrations. 



estimated thermodynamically infeasible model using col- 
lective fitting. In Figure 3, we show the long-term beha- 
vior of the estimated thermodynamically infeasible 
model. Under certain normal biological conditions (e.g., 
typical EGF receptor concentration, such as the one 
considered in our paper), ERK-PP activity is expected to 
decay to zero at steady-state [25]. However, one can see 
from Figure 3 that, even after 4 hours, the concentration 
of ERK-PP predicted by the estimated thermodynami- 
cally infeasible model is steadily increasing - possibly 
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Figure 3 Long-term ERK-PP concentration dynamics, measured 
in mol/m 3 , predicted by the estimated thermodynamically 
infeasible model (dashed curves) and the TCMC model (solid 
curves) under 0.125 ng/mL (blue curves) and 0.0625 ng/mL 
(red curves) input EGF concentrations. The inset shows the short- 
term behavior predicted by the two models as well as the 
corresponding densitometric data (blue and red circles). 
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being driven by the chemical perpetuum mobiles that 
occur when the Wegscheider conditions are violated. 
This unrealistic behavior appears despite the fact that 
the transient dynamics, depicted in the inset of Figure 3 
by the dashed curves, fit the data better than the TCMC 
dynamics, denoted by the solid curves. Such a sustained 
and long lasting response may lead to different biologi- 
cal outcomes than the ones resulting from ERK-PP 
activity that decays to zero at steady-state [25]. Thus, 
the estimated thermodynamically infeasible model can 
lead to erroneous biological predictions, despite its rea- 
sonable fit to the available densitometric data. 

Our previous examples show that thermodynamic 
consistency may result in model behavior that is differ- 
ent than the one predicted by thermodynamically infea- 
sible models of cellular function. However, more 
research is needed to experimentally validate observed 
differences and demonstrate that lack of thermodynamic 
consistency may indeed result in inaccurate (or even 
false) biological predictions. 

Flux analysis 

Flux-based analysis of biochemical reaction systems is a 
widely used method for understanding the principles 
underlying the production and regulation of mass flow 
in cellular systems, such as signaling or metabolic path- 
ways. It turns-out that the Wegscheider conditions, 
given by (6), constrain the reaction fluxes. If flux analy- 
sis does not take into account these constraints, then it 
may lead to inaccurate or misleading conclusions about 
the behavior and properties of mass flow in biochemical 
reaction systems. 

If {b^ l \ i = 1, 2,..., M 2 } is a set of basis vectors of the 
null space of the stoichiometry matrix So of the closed 
subsystem, and 0^(f, k), 0~(f, k) are respectively the 
forward and reverse fluxes of the m th reaction at time t, 
then (see Additional file 1 and [6]) 



E 

meM-o 



4>m[t, k) 

fori =1,2, . 



(15) 



teT, 



where frW is the m th element of the basis vector 
Note that this equation must be satisfied at each time 
point t eT> even far away from equilibrium. Moreover, 
it is satisfied for any vector b in the null space of So 

Since TCMC always leads to a thermodynamically fea- 
sible biochemical reaction system with parameters satis- 
fying the Wegscheider conditions, the flux constraints 
imposed by (15) are satisfied as well. Thus, thermodyna- 
mically consistent flux analysis can be performed on the 
resulting system without any additional considerations, 
and the behavior of the system is always physically 
realizable. 



Bias-variance tradeoff and overfitting 

In addition to the previous advantages, there is an 
important statistical benefit for thermodynamically con- 
straining the parameters of a biochemical reaction sys- 
tem. By searching for kinetic parameter values within a 
thermodynamically consistent subset of the parameter 
space, we may reduce the variance of estimation and 
thus lower the estimation error through the well-known 
bias-variance tradeoff. 

The mean-square error (MSE) E[[C(£|y) - C(Ar true |y)] 2 ] 
in cost, where £ is an estimator of the 'true' parameters 
ft tme , satisfies: 

E[[C(£|y)-C(/wly)] 2 ] 



MSE 



C(K tme |y)-E[C(£|y)] 2 



Bias 



+ E[[C(K|y)n-E[C(£|y)]] 2 



Variance 

Generally speaking, imposing constraints on the para- 
meters may increase the bias term but decrease the var- 
iance. However, since the true parameter values must 
satisfy the thermodynamic constraints, we expect a 
decrease in variance without an increase in bias. As a 
consequence, searching for parameter values within the 
thermodynamically consistent subspace of the parameter 
space may lead to a lower mean square error in cost 
due to smaller variance. Since the volume of a search 
space grows exponentially in the dimension of the 
space, gains in variance (and hence improvements in the 
mean square error) are expected to be large. 

A related statistical notion in estimation problems is 
data overfitting. Overfitting refers to situations where 
model complexity (i.e., number of parameters) is high 
and the amount and quality of available data is com- 
paratively low. In this case, it is often possible that we 
match the data so well that, in addition to matching the 
underlying physical phenomena of interest, the model 
fits the measurement noise as well. This situation 
reduces the predictive power of the estimated model. 
The relationship of overfitting to the bias-variance tra- 
deoff is clear: complex models are characterized by low 
bias and are able to describe a wide range of phenom- 
ena but suffer from high variance in parameter estima- 
tion (i.e., different data sets may lead to wildly different 
parameter estimates via overfitting). 

Most often, the behavior of biochemical reaction 
systems is only influenced by a small number of para- 
meters (due to robustness of such systems to the under- 
lying kinetic parameters). This reduces the actual 
number of parameters that must be estimated with 
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precision. Moreover, the thermodynamic constraints 
further reduce the number of parameters to be esti- 
mated, alleviating some overfitting concerns. Imposing 
additional parameter constraints, such as the ones 
employed by the Schoeberl model, may further be used 
to combat this issue. Unfortunately, model complexity is 
much higher than the amount and quality of available 
data in most problems of systems biology and overfitting 
remains a major concern even when using TCMC. In 
the example considered in this paper, time series data is 
only available for one crucial chemical species. As a 
consequence, it is natural to expect that the dynamics 
produced by the TCMC model overfit the available data 
to a certain extent. Thus, when more experimental data 
become available, TCMC must be rerun in order to pro- 
duce a better calibration of the model, with a new cost 
function that includes the additional data. 

In light of these concerns, some may argue that col- 
lective fitting of model parameters is not the correct 
approach, and that a reductionist approach is more 
appropriate (i.e., attempting to measure parameters indi- 
vidually and then combine the results to determine an 
appropriate model calibration). Unfortunately, the 
reductionist approach is time consuming, extremely 
expensive, and, in most cases, impossible with current 
experimental techniques. Moreover, incorrect imple- 
mentation of a reductionist approach may lead to a 
thermodynamically infeasible model calibration. This is 
clearly the case with the Schoeberl model (and most 
probably with other models published in the literature). 

TCMC is a collective fitting procedure, but offers a prag- 
matic compromise to the reductionist approach. In light of 
the fact that some parameters may be measured individu- 
ally with extreme precision, TCMC allows for these para- 
meters to be fixed to their measured values using matrix 
A. In addition, a more advanced Bayesian cost function 
can be used (e.g., see [11]) to factor in prior experimental 
knowledge when parameters have been previously esti- 
mated with less precise experimental techniques. 

Computational advantages 

According to the 'curse of dimensionality/ which refers 
to an exponential increase in the volume of the para- 
meter space as its dimension grows, estimation becomes 
substantially harder in high dimensional spaces. A 
'naive' search of that space, in an effort to find the 'true' 
parameter values of a biochemical reaction system 
[assuming that these values minimize the cost function 
given by (12)], is hopeless. As a matter of fact, the prob- 
ability of obtaining parameter values that satisfy the 
Wegscheider conditions and other underlying log-linear 
constraints by uniformly sampling the entire parameter 
space (which is an 'easier' problem than finding the 
'true' parameter values) is zero. As a consequence, the 



constraints must be explicitly considered by the optimi- 
zation problem at hand to have any hope of successfully 
solving the problem of model calibration. 

As a matter of fact, since the feasible manifold is of 
lower dimension than the entire parameter space, meth- 
ods that do not consider the underlying thermodynamic 
and non-thermodynamic constraints will spend most 
time searching the immense infeasible portions of the 
parameter space. The imposition of constraints among 
the kinetic parameters of a biochemical reaction system 
reduces the dimensionality of the parameter space to a 
smaller feasible region and make parameter estimation 
computationally easier. TCMC makes this explicit, by 
performing optimization over a lower dimensional 
space, spanned by the lower dimensional vectors v, 
instead of the entire parameter space, spanned by the 
higher dimensional vectors k. 

Conclusions 

For a biochemical reaction system to be physically realiz- 
able, it is required that the underlying kinetic parameters 
satisfy certain thermodynamic constraints, known as 
Wegscheider conditions. This issue has been largely 
ignored in the literature, as evidenced by the fact that 
many published models violate these constraints. The 
model calibration method we have proposed in this 
paper can be effectively used to determine thermodyna- 
mically consistent values for the kinetic parameters of 
any set of reactions in an ideal homogeneous mixture at 
constant temperature and volume. Our method is simple 
to understand and implement. Moreover, it can be easily 
incorporated into any existing or newly proposed calibra- 
tion technique in order to make sure that the resulting 
model satisfies the fundamental laws of thermodynamics 
as well as other desirable conditions and constraints. 

There are two major issues associated with calibrating 
biochemical reaction systems: 

1. The quality and quantity of available data are 
inadequate to allow sufficient estimation of all 
underlying parameter values. 

2. Biochemical reaction models contain many para- 
meters whose numbers dramatically increase with 
model size and detail. As a consequence, the curse 
of dimensionality seriously hampers estimation 
algorithms. 

The first issue is primarily associated with current lim- 
itations of experimental methods and approaches. To 
address this issue, we need substantial improvements in 
experimental equipment and methodologies. However, 
TCMC scales well with future improvements in data 
quality and quantity. Matrix A can handle arbitrary para- 
meter measurements (equilibrium constants, Haldane 
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relations, direct kinetic parameter measurements, etc.). 
Moreover, TCMC can employ any cost function of 
choice, so additional concentration data can be incorpo- 
rated seamlessly therein. 

The second issue is the largest obstacle facing model 
calibration techniques. To reduce dimensionality, we 
must attempt to exploit mathematical structure particu- 
lar to the biological problem at hand. TCMC attempts 
to address this problem in two ways: 

♦ First, TCMC uses the fact that there are funda- 
mental physical principles underlying biochemical 
reaction systems that may constrain the set of possi- 
ble kinetic parameter values. As a consequence of 
the fundamental laws of thermodynamics, most 
complex biochemical networks contain reaction 
cycles that constrain the kinetic parameters accord- 
ing to (9). These constraints allow TCMC to reduce 
dimensionality by restricting the estimation problem 
on a smaller thermodynamically feasible subset of 
the parameter space. 

♦ Second, experimental data and mathematical analy- 
sis can often provide other forms of constraints on 
the underlying parameters (e.g., through directly 
measuring rate or equilibrium constants, by deter- 
mining Haldane relationships between enzymatic 
parameters, etc.). In particular, sensitivity analysis 
may reveal non-influential kinetic parameters that 
can be set to some nominal values without appreci- 
ably affecting system behavior. All these additional 
constraints can be accounted for by the A matrix in 
(10), further reducing the dimensionality of the 
ensuing parameter estimation problem. 

As we mentioned before, dimensionality reduction is 
made explicit by TCMC, since optimization takes place 
over a smaller dimensional vector v, instead of the 
higher dimensional vector k specified by the model. As 
a consequence, TCMC does not entirely rely on finding 
the globally optimal parameter values that best fit avail- 
able dynamic data of molecular concentration. Imposing 
thermodynamic (and other log-linear) constraints allows 
TCMC to restrict its search for appropriate parameter 
values over a smaller subspace of the entire parameter 
space in order to reach a compromise between optimal 
data fit and biophysical feasibility. It has been recently 
pointed out in the literature that this approach to para- 
meter estimation should be considered as an important 
part of determining the parameter values of complex 
biological models [26]. 

Recently, a method has been proposed in the literature 
for inferring a complete and consistent set of kinetic 



parameter values from incomplete and inconsistent data 
[27]. This method, known as 'parameter balancing,' 
employs a Bayesian estimation approach based entirely 
on published data pertaining the values of the underly- 
ing kinetic parameters. Although parameter balancing 
can be used to provide thermodynamically consistent 
values for the kinetic parameters of a biochemical reac- 
tion system, the method does not include quantitative 
dynamic measurements of molecular concentrations. As 
a consequence, parameter balancing may result in a 
thermodynamically feasible biochemical reaction model 
that does not adequately predict experimental observa- 
tions of dynamic system behavior. Future research may 
focus on combining TCMC with parameter balancing to 
utilize published parameter sets as well as dynamic 
experimental data. 

A problem that we have not addressed in this paper is 
the influence of ions, such as K + , and Ca 2+ , and certain 
environmental factors, such as the temperature and pH, 
on the thermodynamic behavior of a biochemical reac- 
tion system [28]. Our objective in this paper is not to 
address biochemical reaction systems with this level of 
complexity, but to focus on the widely reported simpler 
models of cellular function that consider only interac- 
tions among biochemical reactants in a fixed environ- 
ment. Note, however, that temperature and pH 
dependence of parameters has been accounted for in 
[27] via parameter balancing using log-linear equations 
between biochemical parameters. Since arbitrary log-lin- 
ear constraints between parameters can be enforced by 
TCMC via (10), we suspect that TCMC can be used 
directly or appropriately modified to handle additional 
biochemical complexities that have not been addressed 
in this paper. 

Another problem that we have not addressed here is 
constructing new biochemical reaction models of cellu- 
lar function. Since, in this paper, we only address the 
model calibration problem, we take (1) as given and 
proceed to determine the parameter vector k from data. 
In general, determining the structure (i.e., the stoichio- 
metry) of a biochemical reaction network is an extre- 
mely laborious task. Preliminary work indicates that 
thermodynamics can also play a key role in estimating 
the structural complexity of biochemical reaction sys- 
tems [29]. Future scientific investigations are necessary 
to further examine this open problem. 

Additional material 

r -\ 

Additional file 1: In this document, we provide supplementary 
mathematical and computational details required to fully 
understand the material presented in the Main Text. 
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